feat(vtuber): add OpenAI-compatible adapter by canyugs · Pull Request #1234 · openabdev/openab

canyugs · 2026-06-28T15:52:03Z

What problem does this solve?

Desktop character apps such as AniCompanion, Open-LLM-VTuber, and ChatVRM already speak OpenAI chat completions, but they usually connect to a raw LLM. This PR lets those skins point at OpenAB instead, so the same UI gets ACP-backed tool use, code editing, memory, MCP, and existing OpenAB steering.

Closes #1233

Discord Discussion URL: https://discord.com/channels/1491295327620169908/1520790210320011274

Architecture

The VTuber adapter now runs inside the unified OpenAB binary:

Skin (AniCompanion / Open-LLM-VTuber / ChatVRM)
  |
  |-- POST /v1/chat/completions  (stream:true, Bearer key)  <- Tier-1 SSE
  |     choices[].delta.content, including inline [emotion] tags
  |
  |-- GET /v1/vtuber/ws  (Bearer key, optional)             <- Tier-2 WS
        agent_state / emotion / tool_status / notification
  |
OpenAB unified binary
  |-- crates/openab-gateway/src/adapters/vtuber.rs
  |-- src/unified_adapter.rs
  |-- src/acp -> coding agent (codex / claude / kiro)

No separate gateway process or adapter config block is required for VTuber in unified mode. Set VTUBER_ENABLED=true on the OpenAB process, and the unified HTTP listener exposes the OpenAI-compatible endpoint.

Proposed Solution

Tier-1: OpenAI-Compatible SSE

POST /v1/chat/completions streams OpenAI-compatible chat.completion.chunk events.
messages[] is forwarded to the configured ACP agent with no adapter-added steering.
Inline [emotion] tags pass through for existing skins that already parse and strip them before TTS.
Each request creates one OpenAB session; the skin keeps conversation history in messages[].
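For reviewers unfamiliar with the wire format, here is a minimal sketch of what a Tier-1 frame could look like on the stream. The helper names (`json_escape`, `chunk_frame`, `comment_frame`, `done_frame`) are hypothetical and not taken from `vtuber.rs`; the JSON shape follows the public OpenAI `chat.completion.chunk` layout, and the adapter itself presumably uses a JSON library rather than hand-written escaping.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one SSE `data:` frame carrying a single content delta.
/// Inline [emotion] tags are left in `content` untouched.
fn chunk_frame(id: &str, model: &str, created: u64, content: &str) -> String {
    format!(
        "data: {{\"id\":\"{}\",\"object\":\"chat.completion.chunk\",\"created\":{},\"model\":\"{}\",\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}},\"finish_reason\":null}}]}}\n\n",
        json_escape(id),
        created,
        json_escape(model),
        json_escape(content)
    )
}

/// SSE comment line (starts with a colon); clients ignore it as data.
fn comment_frame(text: &str) -> String {
    format!(": {}\n\n", text)
}

/// Terminal frame that OpenAI-style streams end with.
fn done_frame() -> &'static str {
    "data: [DONE]\n\n"
}
```

A skin that already parses OpenAI streams would read `choices[0].delta.content` from each `data:` frame and stop at `[DONE]`.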

Tier-2: Optional WebSocket Side Channel

GET /v1/vtuber/ws pushes structured UI events without affecting OpenAI compatibility.
Server events include agent_state, emotion, tool_status, and notification.
Clients can send subscribe to filter event categories and ping for keepalive.
Auth uses Authorization: Bearer <VTUBER_AUTH_KEY> or ?token=.
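As an illustration of the two auth paths, the sketch below pulls the key from either the header or the query string. `extract_token` is a hypothetical helper, not the adapter's actual function, and it assumes the header takes precedence when both are present. It also skips percent-decoding, which a real handler would need for keys containing reserved characters.

```rust
/// Return the client-supplied key from an `Authorization` header value,
/// falling back to a `token=` pair in the raw query string.
/// Note: the query value is returned as-is (no percent-decoding).
fn extract_token<'a>(auth_header: Option<&'a str>, query: Option<&'a str>) -> Option<&'a str> {
    // Header form: "Bearer <key>"
    if let Some(tok) = auth_header.and_then(|h| h.strip_prefix("Bearer ")) {
        return Some(tok.trim());
    }
    // Query form: "...&token=<key>&..."
    query?
        .split('&')
        .find_map(|pair| pair.strip_prefix("token="))
}
```

The returned value would then be compared against `VTUBER_AUTH_KEY` before the WebSocket upgrade is accepted.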

Why this approach?

Zero client changes for skins that already support OpenAI-compatible chat completions.
Unified binary deployment matches the current OpenAB platform-adapter model.
Tier-2 remains additive: clients that do not connect still get complete Tier-1 chat.
JSON-over-WebSocket with a type discriminator matches common avatar tooling patterns and keeps protocol integration simple.
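To make the type-discriminator idea concrete, here is a rough sketch of per-client category filtering. The four wire names come from this PR; `Category`, `Subscription`, and the rule that an unfiltered client receives everything are assumptions for illustration, and the exact `subscribe` message shape is not shown here.

```rust
use std::collections::HashSet;

/// Event categories, keyed by the value of the JSON `type` field.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Category {
    AgentState,
    Emotion,
    ToolStatus,
    Notification,
}

impl Category {
    /// Map a wire `type` string to a category; unknown names yield None.
    fn from_type(s: &str) -> Option<Category> {
        match s {
            "agent_state" => Some(Category::AgentState),
            "emotion" => Some(Category::Emotion),
            "tool_status" => Some(Category::ToolStatus),
            "notification" => Some(Category::Notification),
            _ => None,
        }
    }
}

/// Per-client filter. `None` means "no filter set, deliver everything".
struct Subscription {
    allowed: Option<HashSet<Category>>,
}

impl Subscription {
    fn all() -> Self {
        Subscription { allowed: None }
    }

    /// Build a filter from the names in a subscribe request,
    /// silently ignoring names that are not known categories.
    fn only(types: &[&str]) -> Self {
        let set = types.iter().filter_map(|t| Category::from_type(t)).collect();
        Subscription { allowed: Some(set) }
    }

    fn wants(&self, c: Category) -> bool {
        match &self.allowed {
            None => true,
            Some(set) => set.contains(&c),
        }
    }
}
```

A broadcast loop would call `wants` per client before serializing an event, so skins that only care about `emotion` never see tool traffic.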

Validation

cargo check -p openab --features unified
cargo test -p openab has_unified_platform_env_checks --features unified
cargo test -p openab-gateway vtuber — 12 VTuber tests passed

Notes

docs/vtuber.md now documents unified-mode setup only.
docs/config-reference.md lists the VTuber unified environment variables.
The legacy standalone gateway crate still hosts the reusable adapter module, but the VTuber route is wired into src/main.rs for unified OpenAB deployments.

Expose POST /v1/chat/completions (SSE) backed by the OAB agent, so any OpenAI-compatible character skin (AniCompanion, Open-LLM-VTuber, …) gets a real agent with zero client changes.

messages[] is flattened into the agent prompt; the agent's streamed reply is re-emitted as OpenAI chat.completion.chunk deltas via a per-request channel.id registry drained by the /ws recv loop. Inline [emotion] tags pass through untouched.

Tier 1 of RFC openabdev#1233.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NXngQyvmJwQPNYsiRUNU2m

smallgun01 · 2026-06-28T16:06:28Z

Additional: Frontend WebSocket Client Demo

Added examples/vtuber-demo/index.html — a minimal WebSocket client reference implementation for VTuber skins.

What it does:

Connects to OAB Gateway via raw WebSocket (ws://host:8080/ws?token=)
Sends openab.gateway.event.v1 (text messages)
Receives openab.gateway.reply.v1 (streaming reply with cursor animation)
Dark theme UI, settings persisted in localStorage

Usage:

Open index.html in browser, configure WS URL and token, connect and chat.

Note:

Currently the gateway's handle_oab_connection does not forward GatewayEvents from WS clients to OAB core — backend update needed for full functionality.

Code:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>OAB VTuber Client</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: monospace; background: #1a1a2e; color: #e0e0e0; display: flex; flex-direction: column; height: 100vh; }
    #header { padding: 12px 16px; background: #16213e; border-bottom: 1px solid #0f3460; display: flex; align-items: center; gap: 12px; }
    #header h1 { font-size: 16px; color: #e94560; }
    #status { font-size: 12px; padding: 2px 8px; border-radius: 4px; }
    .disconnected { background: #555; }
    .connected { background: #2d6a4f; }
    .error { background: #9b2226; }
    #config { padding: 8px 16px; background: #16213e; border-bottom: 1px solid #0f3460; display: flex; gap: 8px; flex-wrap: wrap; align-items: center; font-size: 12px; }
    #config input { background: #0f3460; border: 1px solid #533483; color: #e0e0e0; padding: 4px 8px; border-radius: 4px; font-family: monospace; font-size: 12px; }
    #config button { background: #533483; border: none; color: #e0e0e0; padding: 4px 12px; border-radius: 4px; cursor: pointer; font-size: 12px; }
    #config button:hover { background: #e94560; }
    #chat { flex: 1; overflow-y: auto; padding: 16px; display: flex; flex-direction: column; gap: 8px; }
    .msg { padding: 8px 12px; border-radius: 6px; max-width: 80%; white-space: pre-wrap; word-break: break-word; font-size: 14px; line-height: 1.5; }
    .msg.user { background: #533483; align-self: flex-end; }
    .msg.assistant { background: #0f3460; align-self: flex-start; }
    .msg.system { background: #2d2d2d; align-self: center; font-size: 11px; color: #888; }
    .msg.error { background: #9b2226; align-self: center; font-size: 12px; }
    #input-area { padding: 12px 16px; background: #16213e; border-top: 1px solid #0f3460; display: flex; gap: 8px; }
    #msg-input { flex: 1; background: #0f3460; border: 1px solid #533483; color: #e0e0e0; padding: 8px 12px; border-radius: 6px; font-family: monospace; font-size: 14px; resize: none; }
    #send-btn { background: #e94560; border: none; color: #fff; padding: 8px 20px; border-radius: 6px; cursor: pointer; font-size: 14px; font-weight: bold; }
    #send-btn:hover { background: #c81d4e; }
    #send-btn:disabled { background: #555; cursor: not-allowed; }
    .cursor { display: inline-block; width: 8px; height: 14px; background: #e94560; animation: blink 0.8s infinite; vertical-align: text-bottom; }
    @keyframes blink { 0%, 100% { opacity: 1; } 50% { opacity: 0; } }
  </style>
</head>
<body>
  <div id="header">
    <h1>OAB VTuber Client</h1>
    <span id="status" class="disconnected">Disconnected</span>
  </div>
  <div id="config">
    <label>WS: <input type="text" id="ws-url" size="40"></label>
    <label>Token: <input type="password" id="ws-token" placeholder="API key" size="20"></label>
    <button id="connect-btn" onclick="toggleConnection()">Connect</button>
  </div>
  <div id="chat"></div>
  <div id="input-area">
    <textarea id="msg-input" rows="2" placeholder="Type a message..." onkeydown="handleKey(event)"></textarea>
    <button id="send-btn" onclick="sendMessage()" disabled>Send</button>
  </div>

  <script>
    const STORAGE_KEY = 'oab-vtuber-config';
    let ws = null;
    let currentAssistantEl = null;
    let cursorEl = null;

    const chatEl = document.getElementById('chat');
    const msgInput = document.getElementById('msg-input');
    const sendBtn = document.getElementById('send-btn');
    const connectBtn = document.getElementById('connect-btn');
    const statusEl = document.getElementById('status');
    const wsUrlInput = document.getElementById('ws-url');
    const wsTokenInput = document.getElementById('ws-token');

    function loadConfig() {
      try {
        const saved = JSON.parse(localStorage.getItem(STORAGE_KEY));
        if (saved) {
          wsUrlInput.value = saved.wsUrl || 'ws://localhost:8080/ws';
          wsTokenInput.value = saved.token || '';
        }
      } catch {
        wsUrlInput.value = 'ws://localhost:8080/ws';
      }
    }

    function saveConfig() {
      localStorage.setItem(STORAGE_KEY, JSON.stringify({
        wsUrl: wsUrlInput.value.trim(),
        token: wsTokenInput.value.trim()
      }));
    }

    loadConfig();

    function addMessage(text, role) {
      const el = document.createElement('div');
      el.className = `msg ${role}`;
      el.textContent = text;
      chatEl.appendChild(el);
      chatEl.scrollTop = chatEl.scrollHeight;
      return el;
    }

    function showCursor(parentEl) {
      removeCursor();
      cursorEl = document.createElement('span');
      cursorEl.className = 'cursor';
      parentEl.appendChild(cursorEl);
    }

    function removeCursor() {
      if (cursorEl) {
        cursorEl.remove();
        cursorEl = null;
      }
    }

    function setStatus(state, text) {
      statusEl.className = state;
      statusEl.textContent = text;
    }

    function toggleConnection() {
      if (ws && ws.readyState === WebSocket.OPEN) {
        ws.close();
      } else {
        connectWS();
      }
    }

    function connectWS() {
      const baseUrl = wsUrlInput.value.trim();
      const token = wsTokenInput.value.trim();
      if (!baseUrl) return addMessage('Please enter WebSocket URL', 'error');

      saveConfig();
      const url = token ? `${baseUrl}?token=${encodeURIComponent(token)}` : baseUrl;
      addMessage(`Connecting to ${url}...`, 'system');

      try {
        ws = new WebSocket(url);
      } catch (e) {
        addMessage(`Connection error: ${e.message}`, 'error');
        return;
      }

      ws.onopen = () => {
        setStatus('connected', 'Connected');
        connectBtn.textContent = 'Disconnect';
        sendBtn.disabled = false;
        addMessage('Connected to OAB Gateway', 'system');
      };

      ws.onmessage = (event) => {
        try {
          const reply = JSON.parse(event.data);

          if (reply.schema === 'openab.gateway.reply.v1') {
            const content = reply.content;
            if (content && content.type === 'text' && content.text) {
              if (!currentAssistantEl) {
                currentAssistantEl = addMessage('', 'assistant');
              }
              currentAssistantEl.textContent += content.text;
              showCursor(currentAssistantEl);
              chatEl.scrollTop = chatEl.scrollHeight;
            }

            if (reply.done) {
              removeCursor();
              currentAssistantEl = null;
            }
          } else if (reply.type === 'done' || reply.done === true) {
            removeCursor();
            currentAssistantEl = null;
          } else {
            addMessage(`[Unknown reply] ${event.data}`, 'system');
          }
        } catch {
          if (!currentAssistantEl) {
            currentAssistantEl = addMessage('', 'assistant');
          }
          currentAssistantEl.textContent += event.data;
          showCursor(currentAssistantEl);
          chatEl.scrollTop = chatEl.scrollHeight;
        }
      };

      ws.onclose = () => {
        setStatus('disconnected', 'Disconnected');
        connectBtn.textContent = 'Connect';
        sendBtn.disabled = true;
        removeCursor();
        currentAssistantEl = null;
        addMessage('Disconnected', 'system');
      };

      ws.onerror = () => {
        setStatus('error', 'Error');
        addMessage('WebSocket error', 'error');
      };
    }

    function sendMessage() {
      const text = msgInput.value.trim();
      if (!text || !ws || ws.readyState !== WebSocket.OPEN) return;

      addMessage(text, 'user');
      currentAssistantEl = null;
      removeCursor();

      const payload = {
        schema: 'openab.gateway.event.v1',
        platform: 'vtuber',
        content: {
          type: 'text',
          text: text
        }
      };

      ws.send(JSON.stringify(payload));
      msgInput.value = '';
    }

    function handleKey(e) {
      if (e.key === 'Enter' && !e.shiftKey) {
        e.preventDefault();
        sendMessage();
      }
    }
  </script>
</body>
</html>

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

canyugs · 2026-06-28T17:49:13Z

Tier-1 complete ✅

All validation items checked:

Unit tests (config, flatten_messages, delta_suffix incl. multibyte) — 257 passed
Full e2e against real gateway binary (fake agent)
Real ACP agent e2e (kiro-cli driving live LLM)
Cloud deployment e2e (Zeabur, Tencent Tokyo) — curl -sN with Bearer auth → streamed OpenAI deltas → [DONE]
AniCompanion e2e (macOS VRM app) — zero code changes, pointed at cloud gateway, sent messages and received replies from claude-agent-acp
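Since the unit tests mention `delta_suffix` with multibyte input, here is a minimal sketch of how such a helper might behave when the agent sends progressive snapshots. This is not the function from the PR: the fallback of resending the whole snapshot when it no longer extends the previous one is an assumption.

```rust
/// Given the snapshot already streamed (`prev`) and the latest snapshot
/// (`next`), return the text that still needs to be emitted as a delta.
///
/// `strip_prefix` compares whole `&str` values, so the split always lands
/// on a UTF-8 character boundary, even for multibyte text.
fn delta_suffix<'a>(prev: &str, next: &'a str) -> &'a str {
    match next.strip_prefix(prev) {
        Some(rest) => rest,
        // Assumption: if the snapshot was rewritten, resend all of it.
        None => next,
    }
}
```

For example, with `prev = "こん"` and `next = "こんにちは"`, the delta is `"にちは"`; slicing by a raw byte count taken from a different string could instead panic mid-character.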

CI: all checks and smoke tests passing.

Next: Tier-2

Tier-2 RFC opened as #1235 — WebSocket side-channel (/v1/vtuber/ws) for agent-state push, tool visibility, emotion, and ambient notifications. Design is informed by prior art from Open-LLM-VTuber, clawd-on-desk, and VTube Studio API.

Tier-2 adds GET /v1/vtuber/ws — a persistent WebSocket that pushes agent_state, emotion, and notification events derived from GatewayReply commands. VTuber skins connect once and receive real-time state updates (thinking/working/idle, tool usage, emotion tags) without polling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NXngQyvmJwQPNYsiRUNU2m

canyugs · 2026-06-28T20:02:52Z

Tier-2 WebSocket event stream pushed (acd3e38). Pending: e2e testing with a live VTuber skin.

Replace Vec<WsClient> with HashMap<u64, WsClient> and an AtomicU64 counter. The old Vec+index scheme broke when broadcast() called swap_remove on dead clients — surviving clients' stored indices became stale, routing subscribe/pong to the wrong connection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NXngQyvmJwQPNYsiRUNU2m
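The shape of the fix can be sketched as below. `Registry` and the `name` field on `WsClient` are stand-ins for illustration; only the `HashMap<u64, WsClient>` plus `AtomicU64` combination is taken from the commit.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the real client handle (which would hold a sender, filters, etc.).
struct WsClient {
    name: String,
}

/// Clients keyed by a monotonically increasing ID that is never reused,
/// so removing one client cannot change how any other client is addressed.
struct Registry {
    next_id: AtomicU64,
    clients: HashMap<u64, WsClient>,
}

impl Registry {
    fn new() -> Self {
        Registry { next_id: AtomicU64::new(0), clients: HashMap::new() }
    }

    /// Register a client and return its stable ID.
    fn add(&mut self, client: WsClient) -> u64 {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.clients.insert(id, client);
        id
    }

    /// Drop a dead client; other IDs stay valid.
    fn remove(&mut self, id: u64) -> Option<WsClient> {
        self.clients.remove(&id)
    }

    fn get(&self, id: u64) -> Option<&WsClient> {
        self.clients.get(&id)
    }
}
```

By contrast, `Vec::swap_remove(0)` on `["a", "b", "c"]` moves `"c"` into slot 0, so a connection that remembered index 2 would now be out of range and one that remembered index 0 would reach the wrong client.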

F2: Cap in-flight /v1/chat/completions at 32 by checking vtuber_pending size before accepting; returns 429 when full.

F3: Emit SSE comment `: waiting for agent` after 10s of silence, giving clients an early signal before the 180s hard timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NXngQyvmJwQPNYsiRUNU2m

canyugs · 2026-06-29T02:18:26Z

Review findings addressed:

F1 (client index bug): Fixed in c154dc1 — Vec<WsClient> + index → HashMap<u64, WsClient> + AtomicU64. No more stale index after swap_remove.
F2 (rate limit): Fixed in 50c252e — cap at 32 in-flight requests via vtuber_pending.len() check, returns 429.
F3 (idle warning): Fixed in 50c252e — SSE comment : waiting for agent emitted after 10s of silence, before the 180s hard timeout.
F1 (duplicate AppState): Deferred — this pattern is shared across all adapters; refactoring serve() to delegate to AppState::from_env() should be a separate PR.
F3 (emotion intensity): Deferred — hardcoded 1.0 is sufficient until a skin needs variable intensity ([tag:0.8] format).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NXngQyvmJwQPNYsiRUNU2m

canyugs · 2026-06-29T04:11:15Z

All findings resolved, clippy CI fixed in c1b9495. Ready for merge — pending e2e testing with a live VTuber skin.

thepagent · 2026-06-29T12:34:01Z

+[gateway]
+url = "ws://openab-gateway:8080/ws"
+platform = "vtuber"
+streaming = true
+streaming_placeholder = false   # required: avoids the "…" placeholder ambiguity


I think it's not required anymore

thepagent · 2026-06-29T12:34:12Z

+command = "kiro-cli"
+args = ["acp", "--trust-all-tools"]
+working_dir = "/home/agent"


not required

thepagent · 2026-06-29T12:34:42Z

+
+## References
+
+- [ADR: Custom Gateway](adr/custom-gateway.md)


let's remove this ref, we are deprecating custom gateway.

thepagent · 2026-06-29T12:35:23Z

+```
+Skin ──POST /v1/chat/completions (SSE)──▶ Gateway (:8080) ◀──WebSocket── OAB Pod
+       choices[].delta.content (incl. inline [emotion] tags)   (OAB connects out)
+```


update this, we are running single binary now.

thepagent · 2026-06-29T12:36:24Z

+            #[cfg(feature = "vtuber")]
+            vtuber: None,
+            #[cfg(feature = "vtuber")]
+            vtuber_pending: Arc::new(Mutex::new(HashMap::new())),
+            #[cfg(feature = "vtuber")]
+            vtuber_ws_clients: None,


depends on

refactor(gateway): decouple adapter tests via AppState::test_default() #1241

we will simplify this

# Conflicts: # crates/openab-gateway/src/adapters/line.rs # crates/openab-gateway/src/adapters/teams.rs

chaodu-agent · 2026-06-30T05:55:03Z

CHANGES REQUESTED ⚠️ — Solid two-tier VTuber adapter with prior Vec-index bug properly fixed, but missing the in-flight rate limit that was present in earlier iterations and still using non-constant-time auth comparison.

What This PR Does

Desktop character apps (AniCompanion, Open-LLM-VTuber, ChatVRM) that speak OpenAI chat completions can now point at OAB and get a full ACP-backed agent (tool use, code, MCP, memory) with zero client changes. Tier-1 provides the OpenAI-compatible SSE endpoint; Tier-2 adds an optional WebSocket side-channel pushing structured agent-state, emotion, and notification events.

How It Works

Tier-1: POST /v1/chat/completions accepts messages[], flattens into a prompt, dispatches a GatewayEvent to the connected ACP agent via the internal event bus, and streams back chat.completion.chunk SSE deltas via unfold-based state machine. Tail-idle timeout (configurable via VTUBER_REPLY_TAIL_IDLE_MS, default 1500ms) cleanly finalizes the stream after the agent stops sending snapshots.
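The tail-idle behaviour described above could be approximated with a blocking channel as follows. This is a simplified, synchronous sketch: the adapter is async and unfold-based, and it also has a separate hard timeout before the first snapshot that is not modeled here. `tail_idle_from` and `drain_until_idle` are hypothetical names; only the env var name and the 1500 ms default come from the review.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Parse a VTUBER_REPLY_TAIL_IDLE_MS-style value, defaulting to 1500 ms
/// when the variable is unset or not a valid integer.
fn tail_idle_from(raw: Option<&str>) -> Duration {
    let ms = raw.and_then(|v| v.trim().parse::<u64>().ok()).unwrap_or(1500);
    Duration::from_millis(ms)
}

/// Collect snapshots until none arrives within `tail_idle`, or until the
/// sender side is dropped. Either condition finalizes the stream.
fn drain_until_idle(rx: &Receiver<String>, tail_idle: Duration) -> Vec<String> {
    let mut snapshots = Vec::new();
    loop {
        match rx.recv_timeout(tail_idle) {
            Ok(s) => snapshots.push(s),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    snapshots
}
```

The trade-off is latency versus truncation: a shorter window ends the stream sooner after the last token, while a longer one tolerates agents that pause between snapshots.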

Tier-2: GET /v1/vtuber/ws pushes agent_state, emotion, tool_status, and notification events derived from GatewayReply commands (reaction emojis → state mapping, [tag] extraction). Clients can subscribe to filter event categories and ping for keepalive. Optional ambient notification loop via env vars.
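A possible shape for the `[tag]` extraction step is sketched below. The tag grammar (ASCII letters, digits, underscore) and the choice to return the text with tags stripped are assumptions for illustration; the PR only states that tags are extracted for Tier-2 and passed through untouched on Tier-1.

```rust
/// A tag body is assumed to be non-empty ASCII alphanumerics or underscores.
fn is_tag(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Split reply text into (text without tags, tag names in order of appearance).
/// Bracketed spans that do not look like tags are kept verbatim.
fn extract_emotions(text: &str) -> (String, Vec<String>) {
    let mut clean = String::new();
    let mut tags = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        clean.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find(']') {
            Some(close) if is_tag(&after[..close]) => {
                tags.push(after[..close].to_string());
                rest = &after[close + 1..];
            }
            _ => {
                // Not a tag: keep the bracket and continue scanning after it.
                clean.push('[');
                rest = after;
            }
        }
    }
    clean.push_str(rest);
    (clean, tags)
}
```

Each extracted name could then be emitted as an `emotion` event with the hardcoded intensity of 1.0 mentioned earlier in the thread.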

Feature-gated under #[cfg(feature = "vtuber")], integrated into both openab-gateway::serve and the unified binary's main, following the exact same pattern as existing adapters.

Findings

| # | Severity | Finding | Location |
|---|----------|---------|----------|
| 1 | 🟡 | In-flight request cap (previously added as 32-request limit) is missing from `chat_completions`; any client can exhaust memory with unbounded pending requests | `vtuber.rs:380-430` |
| 2 | 🟡 | Bearer token comparison uses `!=` (non-constant-time); timing side-channel risk for auth key extraction | `vtuber.rs:247,395` |
| 3 | 🟢 | WS client tracking properly fixed: `HashMap<u64, WsClient>` with `AtomicU64` eliminates the prior `swap_remove` index invalidation bug | `vtuber.rs:95-100` |
| 4 | 🟢 | Comprehensive test suite: 12 integration + unit tests covering lifecycle, subscribe filtering, auth rejection, ping/pong, and idle stream behavior | `vtuber.rs` tests |
| 5 | 🟢 | Clean architecture: Tier-1 and Tier-2 are independently usable, ambient loop is opt-in with 60s minimum guard | n/a |
| 6 | 🟢 | Excellent documentation with env var reference, troubleshooting, and architecture diagram | `docs/vtuber.md` |

Finding Details

🟡 F1: Missing in-flight request cap

In previous iterations, chat_completions checked vtuber_pending.lock().await.len() >= 32 before accepting a new request and returned 429 when exceeded. This guard is absent in the current commit — vtuber_pending grows unboundedly.

Without this cap, a misbehaving or malicious client can spawn unlimited sessions, each holding an mpsc channel in memory until the 180s timeout, leading to OOM under load.

Suggested fix: Restore the in-flight cap:

let pending = state.vtuber_pending.lock().await;
if pending.len() >= 32 {
    return (StatusCode::TOO_MANY_REQUESTS, "too many in-flight requests").into_response();
}

Or make it configurable via VTUBER_MAX_INFLIGHT.
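If the configurable route is taken, the parsing could look roughly like this. `max_inflight_from` and `should_reject` are hypothetical helpers; treating `0` or an unparsable value as "use the default of 32" is an assumption, not something decided in this thread.

```rust
/// Default cap, matching the hardcoded limit from the earlier iteration.
const DEFAULT_MAX_INFLIGHT: usize = 32;

/// Parse a VTUBER_MAX_INFLIGHT-style value. Missing, invalid, or zero
/// values fall back to the default so a typo cannot disable the endpoint.
fn max_inflight_from(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_MAX_INFLIGHT)
}

/// True when a new request should be answered with 429.
fn should_reject(pending: usize, max: usize) -> bool {
    pending >= max
}
```

At startup the value would be read once, e.g. `max_inflight_from(std::env::var("VTUBER_MAX_INFLIGHT").ok().as_deref())`, and the handler would call `should_reject(pending.len(), max)` while holding the `vtuber_pending` lock.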

🟡 F2: Timing side-channel in auth comparison

if provided != Some(expected.as_str()) { ... }
if token != Some(expected.as_str()) { ... }

Standard != short-circuits on first differing byte, leaking key length and prefix information over repeated requests. For a bearer key protecting agent access, use subtle::ConstantTimeEq or compare fixed-length HMACs:

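use subtle::ConstantTimeEq;

// ct_eq is constant-time for equal-length inputs; it still returns
// early when the slice lengths differ.
let ok: bool = provided
    .map(|p| p.as_bytes().ct_eq(expected.as_bytes()).into())
    .unwrap_or(false);
if !ok {
    return StatusCode::UNAUTHORIZED.into_response();
}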

This is defense-in-depth — practical exploitability depends on network jitter, but it's standard practice for auth token comparison.

Baseline Check

PR opened: 2026-06-28
Author: canyugs
Main already has: unified binary with Telegram, LINE, Feishu, Google Chat, WeCom, Teams adapters — no vtuber code
Net-new value: Entirely new OpenAI-compatible adapter enabling VTuber/character skin frontends as OAB clients (Tier-1 SSE + Tier-2 WebSocket)
Prior review rounds: Vec-index bug fixed, rate limit previously added but now missing

What's Good (🟢)

Zero-change integration: AniCompanion validated with no client modifications, confirmed on cloud deployments (Zeabur, Tencent)
HashMap client registry: Proper fix using AtomicU64 IDs eliminates the swap_remove index invalidation class of bugs
Comprehensive test coverage: Unit tests for config, flatten, delta, reaction mapping, emotion extraction + integration tests for full WS lifecycle, subscribe filtering, auth rejection, and stream idle semantics
Tail-idle pattern: Elegant handling of progressive reply snapshots — closes cleanly after no new content arrives
Feature gating: Compiles out cleanly, minimal diff in existing files (~50 lines mechanical integration)
Ambient notifications: Opt-in, minimum-interval-guarded (60s), skips broadcast when no clients connected
Documentation: docs/vtuber.md is thorough — prerequisites, setup, troubleshooting, env var table, architecture

Addressing External Reviewer Feedback

@smallgun01

Frontend WebSocket Client Demo (examples/vtuber-demo/index.html) — connects to gateway via raw WebSocket

ℹ️ Noted: The proposed demo uses a different protocol path (raw gateway /ws) than what this adapter implements (OpenAI HTTP for Tier-1, dedicated /v1/vtuber/ws for Tier-2). Best suited as a separate PR or updated to use the Tier-2 WebSocket endpoint.

canyugs marked this pull request as ready for review June 28, 2026 17:33

canyugs requested a review from thepagent as a code owner June 28, 2026 17:33

Copilot AI review requested due to automatic review settings June 28, 2026 17:33

Copilot AI reviewed Jun 28, 2026

canyugs mentioned this pull request Jun 28, 2026

RFC: VTuber adapter Tier-2 — WebSocket agent-state push #1235

Open

canyugs requested a review from Copilot June 28, 2026 17:48

Copilot AI reviewed Jun 28, 2026

github-actions Bot added the pending-contributor label Jun 28, 2026